Display location calculation means

ABSTRACT

Display location calculation means and methods for calculating a display location are disclosed. It has become common for users to indicate a point on a display in order to communicate with a machine. It is therefore necessary to be able to calculate the location on the display that is being indicated by the user. A display generator (123) is arranged in operation to generate a display in response to display data generated by a machine (111). A camera operable to generate image data representing at least part of the scene within the field of view of the camera (which part includes at least a portion of the display) is carried in a pointer device (103). Computation means (111, 113, 115) are arranged in operation to receive the image data and the display data and to calculate from these data sets the position and/or orientation of the pointer device (103) relative to the display. A display location can then be calculated from the calculated position and/or orientation.

This application is the US national phase of international application PCT/GB02/05846 filed 20 Dec. 2002, which designated the U.S. and claims benefit of EP 01310832.9, dated 21 Dec. 2001, the entire content of which is hereby incorporated by reference.

This invention relates to display location calculation means and to methods for calculating a display location for use particularly, but not exclusively, with computers.

Users often need to communicate with machines and sometimes they need to control them too. It has become increasingly common for this to be done by the user indicating a point on a display. It is therefore necessary to be able to calculate the location on the screen that is being indicated by the user. Often, the display also informs the user of the status of the machine.

An early example of display location calculation means for a computer used a cathode ray tube (CRT) screen for status display and a light pen for indication. Such light pens housed a photodetector which produced a signal on detecting a sharp increase in received light intensity. That signal was sent to the computer driving the CRT display. To indicate a point on the display the user pressed the tip of the light pen against that point. By monitoring the timing of the passage of the electron beam from the cathode ray tube, the computer was able to calculate the point of contact, i.e. the display location.

In recent years, devices which do not require contact with the screen have become popular. By far the most common means for indicating a point on a display is the computer mouse. However, a computer mouse requires a flat, non-slip surface for use and can collect dirt from that surface which then spoils its operation over time.

Non-contact devices are used for indicating a point on a display in some specialised interfaces. Examples include light guns used as video game controllers. Light guns operate in the same way as light pens but at a distance from the display. Non-contact computer mice are also available. None of these non-contact devices is entirely satisfactory, however.

According to a first aspect of the present invention there is provided display location calculation means comprising:

a display generator arranged in operation to generate a display in accordance with display data generated by a machine;

-   a pointer device carrying a camera operable to generate image data representing at least part of the scene within the field of view of the camera, which part, in use, includes an image of at least a portion of the display;
-   computation means arranged in operation to:
    -   receive said image data;
    -   receive said display data;
    -   calculate, from said image data and said display data, the position and/or orientation of said pointer device relative to said display;
    -   calculate a display location from said calculated position and/or orientation.

By obtaining display data representing a display and image data representing the display as viewed by a camera carried by the pointer device, the position and/or orientation of the pointer device relative to the display can be calculated. Hence a display location can be calculated. The pointer device does not require complex mechanical parts, works with different display generator technologies (e.g. CRT, liquid crystal display (LCD), plasma screens etc.) and need not be placed in contact with the screen or any other surface in use.

Preferably the pointer device is elongate in shape and the display location is the point where the longitudinal axis of said pointer device intersects with the display. In preferred embodiments a cursor is included in the display at the display location; the position of the cursor varies in accordance with the calculated position and/or orientation of the pointer device relative to the display. In this way it is easier for the user to see the point which they are indicating.

Once a user has control of a cursor which is included in a display it is desirable to be able to control the machine. Nowadays, most machines provide a user interface arrangement to enable a user to control their operation. Normally, an interface will include some means for inputting user commands and some means for indicating to the user the status of the machine responsive to those commands. A user can input different commands to the machine by indicating different parts of the display.

According to a second aspect of the present invention there is therefore provided an interface arrangement for providing an interface between a user and a machine comprising:

-   display location calculation means according to the first aspect of the present invention, wherein said computation means is further arranged in operation to control said machine in accordance with the position of the cursor.

In this way, a user can control a machine without needing to learn any special manipulative skills. Furthermore, the need for lots of remote control buttons and the need for strong integration of remote control hardware and interface design found in some interface arrangements is reduced or obviated altogether. Many different types of machine provide such an interface including, for example, personal computers, digital television receivers, video cassette recorders (VCR) and digital video disk (DVD) players.

Preferably the machine comprises a computer having a processor wherein the computation means comprises the processor. By using the computer's processor to carry out the necessary calculation, the requirement for processing power elsewhere in the interface arrangement is reduced or obviated altogether. Alternatively the computation means could also comprise more than one processor, placed alone or together, in the pointer device or remote from it.

Preferably the camera is a digital video camera since the captured image will then be in the same format as the display image and hence readily available for image registration (a process to determine the parameters of the transformation required to bring an image into alignment with another image). If the camera were a fixed focus camera with a very small depth of field, the image would only be in focus at a certain distance from the screen. When a focussed image is obtained, the distance to the screen would therefore be known. Hence the processing required in the calculation will be reduced. In order to keep the calculation simple, in preferred embodiments, the camera has a fixed spatial relationship with the pointer device.

According to a third aspect of the present invention there is provided a method of calculating a display location comprising the steps of:

-   i. generating a display in accordance with display data generated by said machine;
-   ii. capturing image data representing at least part of the scene within the field of view of a camera carried by a pointer device, wherein at least a portion of said display is included in said field of view;
-   iii. calculating from said image data and said display data the position and/or orientation of said pointer device relative to said display;
-   iv. calculating a display location from said calculated position and/or orientation.

By comparing display data generated by a machine and representing a display with image data representing the display as viewed by a camera which is part of a pointer device, the position and/or orientation of the pointer device relative to the display can be calculated irrespective of the way in which the display is refreshed. Based on the results of the calculation, a display location can then be calculated, enabling a user to indicate a point on a display.

In preferred embodiments the history of the position and/or orientation of said pointer device relative to said display is used as an additional variable in the position and/or orientation calculation step. This is preferred because starting with a good estimate of the correct position and/or orientation will reduce the processing required to calculate the current position and/or orientation.

According to a fourth aspect of the present invention there is provided display location calculation means including:

-   a storage medium having recorded therein processor readable code processable to provide an interface between a user and a machine, said code comprising:
-   display data acquisition code processable to obtain display data representing a display;
-   image data acquisition code processable to obtain, from a pointer device carrying a camera, image data representing at least part of the scene within the field of view of said camera;
-   position/orientation calculation code processable to calculate from said display data and said image data the position and/or the orientation of said pointer device relative to said display;
-   display location calculation code processable to calculate from said calculated position and/or orientation a display location.

According to a fifth aspect of the present invention there is provided a digital data carrier carrying a program of instructions executable by processing apparatus to perform the method steps as set out in the third aspect of the present invention.

Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, wherein like reference numbers refer to like parts, and in which:

FIG. 1 illustrates a user operating a pointer device in accordance with an embodiment of the present invention;

FIG. 2 a is a perspective view of the pointer device;

FIG. 2 b is a cross-sectional representation of the pointer device;

FIG. 3 a illustrates a display-based, 3D, right-handed, rectangular coordinate system;

FIG. 3 b illustrates an image-based, 2D, right-handed, rectangular coordinate system;

FIG. 3 c is a diagram illustrating the projection from a camera-based coordinate system to an image-based coordinate system;

FIG. 4 is a flow diagram illustrating the operation of the pointer device control process;

FIG. 5 a is a flow diagram showing in greater detail the stage in the mapping parameter establishment step of FIG. 4 that transforms and projects the display in the display-based coordinate system to form a test image in the image-based coordinate system;

FIG. 5 b is a flow diagram showing in greater detail the stage in the mapping parameter establishment step of FIG. 4 that compares the test image and camera image in order to establish the actual mapping parameters;

FIG. 5 c is a diagram showing the filter windows and filter step size of the camera image;

FIG. 6 is a flow diagram showing in greater detail the steps of FIG. 4 that calculate the intersection point and position the cursor.

FIG. 1 shows a user 101 operating a pointer device 103 connected via cables 105 a/105 b to a Universal Serial Bus (USB) hub 107 which itself is connected to the USB port 109 of a computer 111. The pointer device 103 (FIGS. 2 a and 2 b) comprises a housing 201 elongate in shape and arranged and sized to be held in the hand of a user. The housing has a window 203 at a forward end and a rear wall 205 at the rearward end. The housing 201 houses a camera 207 (with centre of aperture 209) which is longitudinally oriented so as to have a view through the window 203 at the forward end of the housing 201. A suitable camera is the Logitech Quickcam® VC (USB version) although any digital video camera with a USB port, being sufficiently small to be placed inside such a housing, will suffice. The focussing mechanism of the camera is fixed so as to cause a focussed image of an object to be created on the CCD detector array when the object is 300 mm from the camera. Light emitting diode (LED) 210 is provided to indicate to the user that the captured image is in focus. Someone skilled in the art would have no problems in implementing a suitable algorithm to provide this feedback.

The electronic signal output by the camera 207 representing the view through the window 203 is transmitted to the computer 111 via one of the USB cables 105 a. The cable 105 a leads from the camera 207 through the rear wall 205 of the housing 201 and terminates in a USB connector which is attached in use to the USB hub 107. The USB hub 107 is plugged in use into the USB port 109 of the computer 111. The camera 207 draws its power from the USB connection.

Two buttons 211 a/213 a are located on each lateral side of the housing 201 towards its forward end. Two switches 211 b/213 b are provided inwardly of each button 211 a/213 a and are controlled by the depression of the button 211 a/213 a. A wire leads from each switch 211 b/213 b to a circuit board 215 positioned at the rearward end of the housing 201. The state of the switches 211 b/213 b is transmitted to the computer 111 via the other USB cable 105 b which leads from the circuit board 215 through the rear wall 205 of the housing 201 and terminates in a USB connector which is attached in use to the USB hub 107.

Referring once again to FIG. 1, the computer 111 comprises a central processing unit (CPU) 113 and a storage system 115. The storage system 115 comprises:

-   a) a random access memory 117;
-   b) a first storage device 119 (e.g. a hard disk) storing the CPU operating program and image data;
-   c) a second storage device 121 (e.g. a floppy disk drive or CD/DVD drive) for reading data from and/or writing data to a removable storage medium 123.

The computer further comprises a network card 125 for interfacing to a network.

The CPU 113 is in communication with the storage system 115. The output of the computer 111 is displayed on a display screen 123. In the current embodiment the display screen 123 is an LCD screen. It is possible, however, to use a CRT screen instead of the LCD screen. In such a case it will be realised that a flashing effect similar to that seen when CRTs are captured on television may result if the camera captures frames too rapidly. It would therefore be necessary to extend the period of capture to a larger fraction of a second, thus ensuring that the peaks of the CRT output are smoothed over.

Although FIG. 1 shows the pointer device 103 connected to the computer 111 via cables, it will be apparent that a wireless connection, such as a radio or infra-red connection, could be used, allowing the user 101 more freedom of movement. This would be particularly useful if the user 101 was giving a presentation and needed to move around a stage area. In this case the camera would draw its power from rechargeable batteries placed within the housing. The batteries would be recharged by placing the pointing device in an appropriately powered recharging bay. Those skilled in the art would have no difficulty in making the appropriate modifications.

Different rectangular coordinate systems can be used to describe the real world space occupied by the display screen, the camera and the image formed inside the camera.

A display-based, 3D, right-handed, rectangular coordinate system (FIG. 3 a) allows the position of a point within the real world space to be defined by three coordinates [X_(s), Y_(s), Z_(s)]. The unit of distance of the axes is mm. In accordance with the display-based coordinate system, the display screen 123 occupies a rectangular portion of the plane z_(s)=0. The x_(s)-axis runs horizontally along the top edge of the display from the origin (0,0,0) to the top right-hand corner (320,0,0) and the y_(s)-axis runs down the left-hand side of the display to the bottom left-hand corner (0,240,0). Hence the display screen fills an area of 320×240 mm.

The display screen offers VGA (video graphics array) resolution (640×480 pixels). The mapping of the corners of the display screen (defined in the display-based coordinate system) to display screen pixels (x_(sp), y_(sp)) is shown in the table below:

| (x_(s), y_(s), z_(s)) | (x_(sp), y_(sp)) |
|-----------------------|------------------|
| (0, 0, 0)             | (0, 0)           |
| (320, 0, 0)           | (640, 0)         |
| (0, 240, 0)           | (0, 480)         |
| (320, 240, 0)         | (640, 480)       |

Hence it will be realised that the relationship between the display-based coordinate system and the display-screen pixels is:

Relation 1: (x_(s), y_(s), z_(s)) ↔ (½x_(sp), ½y_(sp), 0) for 0≦x_(sp)≦640 & 0≦y_(sp)≦480

A camera-based, 3D, right-handed, rectangular coordinate system (217, FIG. 2 b) is defined by three coordinates [X_(c), Y_(c), Z_(c)]. In the camera-based coordinate system, the centre of the camera lens system 209 occupies the position (0, 0, 0) and the line of view of the camera points down the z_(c)-axis towards increasing values of z_(c). The y_(c)-axis extends laterally in the direction towards the top of the camera and the x_(c)-axis extends laterally and leftward of the camera when viewed from behind the camera. The unit of distance for the axes is mm.

Turning now to FIG. 3 c, an object point 305 in the field of view of the camera viewed through the centre of the camera's lens system 209 forms an image point 307 on the CCD detector array inside the camera. The object point 305 is at a position (x_(co), y_(co), z_(co)) in the camera-based coordinate system. The image point is on the plane z_(c)=−n, where n is the distance between the centre of the camera lens system 209 and the CCD detector array in mm.

An image-based, 2D, right-handed, rectangular coordinate system (FIG. 3 b) is defined by two coordinates [X_(i), Y_(i)]. Again, the unit of distance of the axes is mm. In the image-based coordinate system the origin (0,0) is at the centre of the CCD detector array. When viewed from behind the array, the x_(i)-axis runs horizontally and leftward from the origin (0,0) to the point (1.96585,0) and the y_(i)-axis runs vertically and upward from the origin (0,0) to the point (0,1.6084). Hence the CCD detector array fills an area of 3.9317×3.2168 mm. The point (x_(c), y_(c), −n) in the camera-based coordinate system is equivalent to the point (x_(i), y_(i)) in the image-based coordinate system.

The CCD detector array offers (352×288 pixels) resolution. The mapping of the corners of the array (defined in the image-based coordinate system) to image pixels (x_(ip), y_(ip)) is shown in the table below:

| (x_(i), y_(i))       | (x_(ip), y_(ip)) |
|----------------------|------------------|
| (−1.96585, −1.6084)  | (0, 0)           |
| (1.96585, −1.6084)   | (352, 0)         |
| (−1.96585, 1.6084)   | (0, 288)         |
| (1.96585, 1.6084)    | (352, 288)       |

Hence it will be realised that the relationship between the image-based coordinate system and the image pixels is:

Relation 2: (x_(i), y_(i)) ↔ (0.01117[x_(ip)−176], 0.01117[y_(ip)−144]) for 0≦x_(ip)≦352 & 0≦y_(ip)≦288
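By way of illustration, Relations 1 and 2 can be written as a pair of conversion routines. The following Python sketch uses the constants derived above (0.5 mm per display pixel; 0.01117 mm per CCD pixel, centred at pixel (176, 144)); the function names are illustrative rather than part of the embodiment.

```python
def display_pixel_to_mm(x_sp: float, y_sp: float):
    """Relation 1: display pixel -> display-based coordinates (the screen plane z_s = 0)."""
    return 0.5 * x_sp, 0.5 * y_sp, 0.0

def image_pixel_to_mm(x_ip: float, y_ip: float):
    """Relation 2: CCD pixel -> image-based coordinates, centred on the array."""
    return 0.01117 * (x_ip - 176.0), 0.01117 * (y_ip - 144.0)
```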

The general relation between the display-based coordinate system and the camera-based coordinate system can be expressed by a rigid body transform. A rigid body transform is an arbitrary concatenation of scale, translate and rotate transforms.

The x-coordinate of an object point 305 defined in the display-based coordinate system can be transformed into the x-coordinate of a point defined in the camera-based coordinate system according to the following equation:

$x_{co} = m_{00}x_{s} + m_{01}y_{s} + m_{02}z_{s} + m_{03}$

where m_(ab) are coefficients of the transformation.

Similarly for the y and z-coordinates:

$y_{co} = m_{10}x_{s} + m_{11}y_{s} + m_{12}z_{s} + m_{13}$

$z_{co} = m_{20}x_{s} + m_{21}y_{s} + m_{22}z_{s} + m_{23}$

These three equations can be neatly combined into the following matrix equation:

Relation 3:

$$\begin{bmatrix} x_{co} \\ y_{co} \\ z_{co} \\ 1 \end{bmatrix} = T\begin{bmatrix} x_{s} \\ y_{s} \\ z_{s} \\ 1 \end{bmatrix} = \begin{bmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_{s} \\ y_{s} \\ z_{s} \\ 1 \end{bmatrix} = \begin{bmatrix} m_{00}x_{s} + m_{01}y_{s} + m_{02}z_{s} + m_{03} \\ m_{10}x_{s} + m_{11}y_{s} + m_{12}z_{s} + m_{13} \\ m_{20}x_{s} + m_{21}y_{s} + m_{22}z_{s} + m_{23} \\ 1 \end{bmatrix}$$

where T is the transformation matrix.
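As a concrete illustration of Relation 3, the following minimal numpy sketch composes T from a rotation and a translation and applies it to a display-based point in homogeneous coordinates (the helper names are illustrative, not from the embodiment):

```python
import numpy as np

def make_transform(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build the 4x4 matrix T of Relation 3 from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation      # coefficients m_00 .. m_22
    T[:3, 3] = translation    # coefficients m_03, m_13, m_23
    return T

def display_to_camera(T: np.ndarray, point_s: np.ndarray) -> np.ndarray:
    """Relation 3: display-based (x_s, y_s, z_s) -> camera-based (x_co, y_co, z_co)."""
    homogeneous = T @ np.append(point_s, 1.0)
    return homogeneous[:3]
```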

It is also possible to derive a relationship between the camera-based coordinates of an object point within the field of view of the camera and the image-based coordinates of the image of that object point formed on the camera's CCD detector array.

A triangle Q is defined in front of the camera lens system by the following (FIG. 3 c):

-   1. The plane y_(c)=0 (which, it will be remembered, is a plane bisecting the camera lengthways at right angles to the y_(c)-axis),
-   2. The projection, onto the plane x_(c)=0 (a plane also bisecting the camera lengthways at half the height of the CCD detector array), of the line passing through the object point 305 and the centre of the camera lens system 209,
-   3. The plane z_(c)=z_(co) (a plane at right angles to the longitudinal axis of the camera and passing through the object point 305).

A further triangle R, which is similar (in the relative length of its sides) to triangle Q, is defined behind the lens system by the following:

-   4. The plane y_(c)=0 (as above),
-   5. Line 2 above,
-   6. The plane of the CCD detector array.

Since triangles Q and R are similar, it follows that the relation between the camera-based coordinate system and the image-based coordinate system (ignoring sign changes) can be expressed as:

$$\frac{y_{co}}{z_{co}} = \frac{y_{i}}{n}$$

Likewise:

$$\frac{x_{co}}{z_{co}} = \frac{x_{i}}{n}$$

Or by rearranging and taking sign changes into account:

Relation 4: $y_{i} = -n\,\frac{y_{co}}{z_{co}}$

Relation 5: $x_{i} = -n\,\frac{x_{co}}{z_{co}}$

The combination of these two relations and the above matrix equation defines a relation (x_(s), y_(s), 0) ↔ (x_(i), y_(i)) mapping any point on the display which is in the field of view of the camera to a corresponding point in the image formed on the CCD detector array.
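Chaining Relations 1, 3, 4, 5 and 2 thus maps a display pixel all the way to the CCD pixel on which it is imaged. The following is a minimal sketch, assuming the transform T and the lens-to-CCD distance n are known; the name and numpy usage are illustrative:

```python
import numpy as np

def display_pixel_to_image_pixel(x_sp: float, y_sp: float, T: np.ndarray, n: float):
    """Map a display pixel to the CCD pixel on which it is imaged."""
    # Relation 1: display pixel -> display-based mm coordinates on the plane z_s = 0.
    p_s = np.array([0.5 * x_sp, 0.5 * y_sp, 0.0, 1.0])
    # Relation 3: display-based -> camera-based coordinates.
    x_co, y_co, z_co, _ = T @ p_s
    # Relations 4 and 5: perspective projection onto the CCD plane z_c = -n.
    x_i = -n * x_co / z_co
    y_i = -n * y_co / z_co
    # Relation 2, inverted: image-based mm coordinates -> image pixel position.
    return x_i / 0.01117 + 176.0, y_i / 0.01117 + 144.0
```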

The computer 111 (FIG. 1) operates under the control of software executable to carry out a display location calculation process. As will be understood by those skilled in the art, any or all of the software used to implement the invention can be contained on various transmission and/or storage media 123 such as a floppy disc, CD-ROM or magnetic tape so that it can be loaded onto the computer 111, or could be downloaded over a computer network using a suitable transmission medium. The display location calculation process will now be described in greater detail with reference to the flow chart in FIG. 4.

The display location calculation process begins with the acquisition (step 401) of display data generated on the display screen 123 by the computer 111 and storing it in the data constant memory 121. Then image data defining the latest camera image is acquired by the computer 111 (step 403) and stored in the data constant memory 121. The display data and camera data just acquired are then compared in order to derive the transformation parameters (m_(ab) above) representing the transformation from the display-based coordinate system to the camera-based coordinate system (step 405). After that the parameters defining the 3D relationship that defines the position and/or orientation of the camera 207 (and hence the pointer device 103) relative to the display screen 123 are calculated (step 407). Then the intersection point between the longitudinal axis of the camera and the surface of the display screen 123 is calculated (step 409). The software then controls the computer 111 to add a cursor (step 411) within the display at the point of intersection calculated in step 409. Thereafter, all the above steps 401–411 are repeated in order to maintain fine control of the cursor by the movement of the pointing device. The recalculation of the cursor position should take place as often as possible using the spare processing power of the CPU; ideally it should take place as often as with a standard computer mouse, i.e. about 40 times per second.
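The cycle of steps 401 to 411 can be summarised as follows. This is a structural sketch only: the callables stand for the steps of FIG. 4 and are assumptions of this illustration, not definitions from the embodiment.

```python
from typing import Callable, List

def display_location_cycle(acquire_display: Callable, acquire_image: Callable,
                           derive_transform: Callable, intersect_axis_with_screen: Callable,
                           position_cursor: Callable, history: List) -> None:
    """One iteration of the FIG. 4 loop; repeated, ideally about 40 times per second."""
    display_data = acquire_display()                         # step 401
    image_data = acquire_image()                             # step 403
    T = derive_transform(display_data, image_data, history)  # step 405
    # Step 407 is implicit: T encodes the camera's position and orientation
    # relative to the display screen.
    cursor_point = intersect_axis_with_screen(T)             # step 409
    position_cursor(cursor_point)                            # step 411
    history.append(T)                                        # step 413 (optional)
```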

An optional extension to the above process involves a step 413 which provides the history of the position and/or orientation of the camera relative to the display screen (in the form of previously used transformations) to step 405, which derives the parameters of the mapping from the display-based coordinate system to the camera-based coordinate system. This is preferred because starting with a good estimate of the correct position and/or orientation will reduce the processing required.

The display location calculation algorithm steps described above will now be described in further detail.

Acquiring the display data (FIG. 4, step 401) involves acquiring the digitally quantised intensity levels of the outputted pixels displayed on the display screen 123 in one or more colour bands c. An intensity (black and white) image has only one band whereas an RGB (true colour) image has three colour bands; that is red, green and blue.

The display pixel intensity values in each colour band c are samples of the display intensity function S_(c)(x_(sp), y_(sp)). The number of samples depends upon the resolution of the display screen 123. As described above the display screen 123 offers VGA resolution (640×480 pixels). The position of the display screen pixel (306,204) at display-based coordinates (153,102,0) is shown in FIG. 3 a. The display screen pixel intensity value for the display screen pixel (306,204) in the green band is the sample value of S_(g)(306,204).

The display screen pixel intensity values for all samples are stored as a 2D array referred to as S_(csample). For example, the above sample value of S_(g)(306,204) is stored in S_(gsample) at position (306,204).

The display pixel colour values are samples of the display colour function S_(colour)(x_(sp), y_(sp)) which is a function of the display pixel intensity values in each colour band such that S_(colour)(x_(sp), y_(sp)) = S_(R)(x_(sp), y_(sp)) S_(G)(x_(sp), y_(sp)) S_(B)(x_(sp), y_(sp)). The display pixel colour values for all samples are stored as a 2D array referred to as S_(colour). For example, the colour value for the display pixel (306,204) is stored in S_(colour) at position (306,204).

Acquiring the image data (FIG. 4, step 403) involves acquiring the digitally quantised intensity levels of the pixels of the image captured on the camera's CCD detector array in one or more colour bands c.

The image pixel intensity values in each colour band, c, are samples of the function I_(c)(x_(ip), y_(ip)). The number of samples depends upon the resolution of the CCD detector array. As described above the CCD detector array offers (352×288 pixels) resolution. The position of the image pixel (90,70) at image-based coordinates (−0.96062, −0.82658) is shown in FIG. 3 b. The image pixel intensity value for the pixel (90,70) in the red band is the sample value of I_(r)(90,70).

The image pixel intensity values for all samples are stored as a 2D array referred to as I_(csample). For example, the above sample value of I_(r)(90,70) is stored in I_(rsample) at position (90,70).

The image pixel colour values are samples of the image colour function I_(colour)(x_(ip), y_(ip)) which is a function of the image pixel intensity values in each colour band such that I_(colour)(x_(ip), y_(ip)) = I_(R)(x_(ip), y_(ip)) I_(G)(x_(ip), y_(ip)) I_(B)(x_(ip), y_(ip)). The image pixel colour values for all samples are stored as a 2D array referred to as I_(colour). For example, the colour value for the image pixel (90,70) is stored in I_(colour) at position (90,70).

Referring to FIGS. 5 a and 5 b, the derivation of the transformation parameters representing the transformation from the display-based coordinate system to the camera-based coordinate system (FIG. 4, step 405) will now be disclosed in greater detail. The purpose of the derivation of the mapping parameters is to determine the transform that most closely describes the relationship between the display-based coordinate system and the camera-based coordinate system.

Once a best fit transform has been identified, the mathematical relationship between the display-based coordinate system and the camera-based coordinate system can be determined. In other words, the 3D position of the camera relative to the display screen 123 and the orientation of the camera axis can be determined (FIG. 4, step 407), since every position and/or orientation of the camera (apart from those positions where the display screen is not visible) will result in a unique image of the display screen 123 in the camera 207.

Referring to FIG. 5 a, a test image is generated by applying a hypothetical transform T to the current display image displayed on the display screen 123. This test image will be a very close approximation to the way the camera image should look, assuming T was the correct rigid body transform that would transform the display image to look like the camera image.

An initial hypothesis is made (step 501) regarding the relationship of display screen and camera, represented by the rigid body transform T mapping points in the display-based coordinate system to the camera-based coordinate system. The information from step 413 regarding the history of the position and/or orientation of the camera relative to the display screen 123 provides important clues and may be used in forming the hypothesis. For example, the last position of the cursor can be used in the initial hypothesis. The ‘rate of change of position’, or velocity, over the last two positions can improve the estimate, and the acceleration can improve the estimate further still. Where no suitable hypothesis exists, a default relationship of perpendicular to the display screen may be assumed as a starting point.
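One plausible way to turn the step 413 history into the step 501 hypothesis is a low-order extrapolation over the most recent transforms, mirroring the position/velocity/acceleration remarks above. A hedged numpy sketch, assuming the history holds the 4×4 transform matrices; extrapolating matrix entries linearly is crude (the result is not exactly rigid and may be re-orthonormalised before use):

```python
import numpy as np

def initial_hypothesis(history: list) -> np.ndarray:
    """Form the step 501 starting transform from the step 413 history."""
    if len(history) >= 3:
        # Position + velocity + acceleration: second-order extrapolation of the entries.
        return 3 * history[-1] - 3 * history[-2] + history[-3]
    if len(history) == 2:
        return 2 * history[-1] - history[-2]   # position + rate of change of position
    if len(history) == 1:
        return history[-1]                     # last known relationship
    # Default: a pointer roughly perpendicular to the screen. The 300 mm offset is
    # an assumed placeholder (the camera's fixed focus distance), not a value the
    # embodiment prescribes for this purpose.
    T = np.eye(4)
    T[2, 3] = 300.0
    return T
```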

A colour band c is selected (step 503) from one of the available colour bands. Then, all values in an array I_(ctest) are set to zero and the pixel (0,0) displayed on the display screen 123 is selected (step 505). The coordinates (x_(s), y_(s)) of the point mapping to this pixel are then calculated (step 506) using relation 1 and this point is then transformed first from the display-based coordinate system into the camera-based coordinate system using relation 3 (step 507) and subsequently from the camera-based coordinate system to the image-based coordinate system using relations 4 & 5 (step 509). After that these image-based coordinates are converted into image pixel positions (x_(ip), y_(ip)) using relation 2 (step 510). It may be necessary to round the image pixel positions to the nearest integer value (step 511). Then the sample value of the display intensity function S_(c)(x_(sp), y_(sp)) is acquired from the array S_(csample) (step 513). This value is added at position (x_(ip), y_(ip)) to the running average in the array I_(ctest) storing the predicted image intensity values (step 515). A running average is needed since the array I_(ctest) is the same size as I_(csample) and is hence smaller than S_(csample). The value stored in any one position in I_(ctest) may therefore be the average value of a number of display pixels that map to the image pixels represented by that position.
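Steps 505 to 515 amount to forward-mapping every display pixel through the hypothesised T and averaging the contributions that land on the same image pixel. A minimal single-band sketch, reusing the display_pixel_to_image_pixel helper sketched earlier and indexing arrays as [row, column], a numpy convention (the embodiment's text indexes the same arrays as (x, y)):

```python
import numpy as np

def generate_test_image(S_csample: np.ndarray, T: np.ndarray, n: float) -> np.ndarray:
    """Predict the camera image for hypothesis T (FIG. 5 a, one colour band)."""
    I_ctest = np.zeros((288, 352))
    counts = np.zeros((288, 352))   # contributions per image pixel, for the running average
    for y_sp in range(S_csample.shape[0]):
        for x_sp in range(S_csample.shape[1]):
            # Steps 506-510: relations 1, 3, 4 & 5 and 2, via the helper sketched earlier.
            x_ip, y_ip = display_pixel_to_image_pixel(x_sp, y_sp, T, n)
            x_ip, y_ip = round(x_ip), round(y_ip)   # step 511: round to nearest pixel
            if 0 <= x_ip < 352 and 0 <= y_ip < 288:
                I_ctest[y_ip, x_ip] += S_csample[y_sp, x_sp]   # steps 513-515
                counts[y_ip, x_ip] += 1
    np.divide(I_ctest, counts, out=I_ctest, where=counts > 0)  # complete the running average
    return I_ctest
```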

Tests are performed (steps 517 and 519) to check whether the intensity contribution of all display pixels has been accounted for in I_(ctest). If not, then steps 506 to 515 are repeated for the next display pixel. If all display pixels have been accounted for, then a further test is carried out (step 521) to determine whether or not I_(ctest) has been estimated for all colour bands c. If not, then steps 503 to 519 are repeated for the next colour band.

The test image generated as described above and stored in the array I_(ctest) and the camera image captured by the camera and stored in the array I_(csample) can be compared to form a real-valued number indicating the discrepancy between the images. This discrepancy can be minimised through iterative testing according to any well-known error minimisation algorithm.

Referring to FIG. 5 b, the procedure to compare the test image and camera image in order to minimise the discrepancy (and hence determine the transformation parameters) will now be described in greater detail.

Initially, the error is set to zero and the index numwindows is initialised (e.g. at zero) (step 530). Then a colour band c is chosen (step 531) from one of the available colour bands. Referring now to FIG. 5 c, the real image (and similarly, but not shown, the test image) is split up into (w×w) pixel sized windows 580. Each filter window overlaps the neighbouring filter windows by s pixels in both the horizontal and vertical directions and s will be referred to as the filter window step. Smaller filter windows provide a more accurate solution but involve more processing. Good compromise values are w=128 and s=32.

Referring once again to FIG. 5 b, the indexes u and v are initialised (e.g. at zero). Then the sum of the intensity contributions stored in I_(csample) of all the pixels (w×w) in the first real image filter window is calculated and stored as a variable I_(csamplesum) (step 535), and the sum of the intensity contributions stored in I_(ctest) of all the pixels (w×w) in the first test image filter window is calculated and stored as a variable I_(ctestsum) (step 537).

A running total of the squared_error is calculated (step 539) using the formula

squared_error = squared_error + (I_(csamplesum) − I_(ctestsum))²

and then the index numwindows is advanced by one (step 541). After that, tests are performed (steps 543 and 545) to determine whether the contribution of all the pixels in the two pixel images (the camera image and test image) has been accounted for in the error calculation. If not, then steps 535 to 541 are repeated for the next filter window. Otherwise a further test is performed (step 547) to determine whether or not all colour bands c have been taken into account. If not, then steps 531 to 545 are repeated. If all colour bands have been accounted for, the average error is calculated (step 549) from the running totals of the error and the index numwindows such that:

mean_squared_error = squared_error / numwindows
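The windowed comparison of FIG. 5 b can be sketched as follows for one colour band, treating s as the window step as described above (names are illustrative):

```python
import numpy as np

def rms_window_error(I_csample: np.ndarray, I_ctest: np.ndarray,
                     w: int = 128, s: int = 32) -> float:
    """Windowed discrepancy between camera image and test image (FIG. 5 b, one band)."""
    squared_error = 0.0
    numwindows = 0                                             # step 530
    height, width = I_csample.shape
    for v in range(0, height - w + 1, s):                      # filter windows, step s apart
        for u in range(0, width - w + 1, s):
            I_csamplesum = I_csample[v:v + w, u:u + w].sum()   # step 535
            I_ctestsum = I_ctest[v:v + w, u:u + w].sum()       # step 537
            squared_error += (I_csamplesum - I_ctestsum) ** 2  # step 539
            numwindows += 1                                    # step 541
    return (squared_error / numwindows) ** 0.5                 # steps 549 and 551: RMS error
```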

The root mean squared error is then calculated (step 551). The RMS error represents the discrepancy between the two images for the transform estimated in step 501. This RMS error, along with other RMS errors representing the discrepancy between the two images for a different transform T, is then iteratively tested according to any well-known error minimisation algorithm (e.g. gradient descent) to determine which transform T gives the minimum discrepancy.

When the minimum RMS error is found, the (known) transformation parameters (including the transform T) that map the display image in the display-based coordinate system to the test image in the image-based coordinate system are assumed to be the (unknown) transformation parameters that map the display image in the display-based coordinate system to the camera image in the image-based coordinate system. Hence the 3D relationship that defines the position and/or orientation of the camera 207 (and hence the pointer device 103) relative to the display screen 123 is derived (FIG. 4, step 407).

If, however, at this stage the error has not been minimised, the hypothesis of the transform T is refined and retested according to the foregoing procedure.

In a preferred embodiment, rather than starting with an estimate for the transform T as in step 501 in FIG. 5 a, an attempt is made to match four points that are visible in the display image to four corresponding points in the camera image. Since for four correspondences relations 3, 4 and 5 above simplify to a system of 8 equations in 8 unknowns, knowledge of these four corresponding points enables an exact solution for the transform T to be found.

The process of matching corresponding points and deriving a solution for transform T will now be described. In this preferred embodiment, regions of homogeneous colour are identified in order to find corresponding points on the borders of these regions. However, those skilled in the art will recognise that such regions are only an example of a feature that can be detected in the camera image. Any other feature that can be detected in the camera image despite the transformation which the display image goes through when captured by the camera could be used.

The first stage in the process is to assign each display pixel and each image pixel to exactly one identifiable region, based on the colour value of the pixel which, it will be remembered, is a sample of the colour function at that pixel. At this point it is worth defining two additional arrays: S_(region) and I_(region), relating to the display pixels and image pixels respectively. These 2D arrays store the identity of the region to which pixels are assigned (or store an indication that pixels have yet to be assigned to a region). For example, the region identifier R1 stored at position (34,78) in array S_(region) indicates that display pixel (34,78) has been assigned to region R1. As a further example, 0 (zero) stored at position (10,23) in array I_(region) indicates that image pixel (10,23) remains unassigned.

The assigning of the display pixels to regions is achieved as follows. A display pixel (x_(sp), y_(sp)) that has yet to be assigned to a region is assigned to a region R1, where R1 is a unique identifier for this new region. The region identifier R1 is therefore stored in the array S_(region) at position (x_(sp), y_(sp)). The colour value for pixel (x_(sp), y_(sp)) is then read from the array S_(colour). The neighbours of pixel (x_(sp), y_(sp)) (i.e. pixels (x_(sp)−1, y_(sp)), (x_(sp)+1, y_(sp)), (x_(sp), y_(sp)−1) and (x_(sp), y_(sp)+1)) are then also assigned to region R1 (assuming they have not been assigned to another region already) if their colour values are the same as (or within a predetermined threshold limit of) the colour value of pixel (x_(sp), y_(sp)). If any of the neighbours of pixel (x_(sp), y_(sp)) has a colour value that does not meet this criterion then it is left unassigned. This process of checking unassigned neighbouring pixels and either assigning them to region R1 or leaving them unassigned continues until the boundaries of the region R1 have been found. Then one of the pixels left unassigned (preferably the closest unassigned pixel to pixel (x_(sp), y_(sp))) is assigned to a second region R2 and the process is repeated. Ultimately, each display pixel will be assigned to exactly one identifiable region.
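The region-assignment procedure is, in effect, a flood fill on colour similarity. A compact sketch, assuming S_colour holds scalar-comparable colour values and using the seed pixel's colour as the reference:

```python
from collections import deque
import numpy as np

def assign_regions(S_colour: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Assign every pixel to exactly one region of homogeneous colour (the S_region array)."""
    h, w = S_colour.shape
    S_region = np.zeros((h, w), dtype=int)           # 0 means "not yet assigned"
    next_region = 1
    for y in range(h):
        for x in range(w):
            if S_region[y, x]:
                continue
            seed_colour = S_colour[y, x]             # reference colour for the new region
            S_region[y, x] = next_region
            frontier = deque([(y, x)])
            while frontier:                          # grow until the boundary is found
                cy, cx = frontier.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and S_region[ny, nx] == 0
                            and abs(S_colour[ny, nx] - seed_colour) <= threshold):
                        S_region[ny, nx] = next_region
                        frontier.append((ny, nx))
            next_region += 1
    return S_region
```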

By counting the number of occurrences of each region identifier (e.g. R1, R2 . . . ) in the array S_(region), it is possible to identify the largest region in the image. It is then possible to identify all the pixels on the border of this largest region. This can be achieved by identifying all pixels assigned to the largest region which have at least one neighbouring pixel which has been assigned to a different region. The curvature of the border at each of these border pixels (i.e. the rate of change of the slope of the border at each of these pixel locations) can then be calculated and the four corners of the region (i.e. points of high curvature where the curvature is above a predetermined threshold value) can be identified. The result is a set of unique, identifiable points: the four corners of the largest region in the display image.

The assigning of the image pixels to regions and the derivation of the corner points of the largest region in the camera image is achieved in a similar way to that described above in relation to the display pixels.

The next stage in the process is to match the four corners of the largest region in the display image to the four corresponding corners of the largest region in the camera image. Using the pixel coordinates of these four corresponding pairs of points, a solution for the transform T can then be found by solving the equations that result from the simplification of relations 3, 4 and 5 above.
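Since the display points of interest lie in the plane z_(s)=0, relations 3, 4 and 5 collapse, for such points, to a planar projective map with eight unknowns (up to scale), which four point correspondences determine exactly. The sketch below solves that 8×8 system directly; it is a standard direct linear solution stated in generic terms rather than the patent's own notation, and recovering the rigid transform T from the resulting map additionally uses the camera constant n.

```python
import numpy as np

def solve_planar_map(display_pts, image_pts) -> np.ndarray:
    """Solve the 8 equations in 8 unknowns given by four point correspondences.

    display_pts: four (x_s, y_s) points on the screen plane z_s = 0.
    image_pts:   the four corresponding (x_i, y_i) image points.
    Returns the 3x3 planar map H, normalised so that H[2, 2] = 1.
    """
    A, b = [], []
    for (X, Y), (x, y) in zip(display_pts, image_pts):
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y]); b.append(x)
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y]); b.append(y)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```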

The above preferred embodiment utilises a process known in the art as feature matching. The process described above is a simple example of such a process and those skilled in the art will recognise that a more sophisticated approach is possible.

Given that in general the transform T can translate any point in the display-based coordinate system into a point in the camera-based coordinate system, it can therefore translate the display-based co-ordinates of the corners of the display screen into positions in camera-based coordinates (FIG. 6, steps 601 and 603).

A test is then carried out (step 605) to check whether the coordinates of 3 corners of the display screen have been translated into camera space. If not, steps 601 to 603 are repeated.

Three points in a Euclidean space define a plane and hence once the coordinates of 3 corners of the display screen have been translated into camera-based coordinates, the formula for the plane of the display screen z_(s)=0 can be determined in terms of camera-based coordinates (step 607). After that the intersection point between the line x_(c)=0, y_(c)=0 (the axis of the camera's view projecting from the centre of the camera's aperture 209 at [0,0,0] along the z_(c)-axis) and the plane of the display screen can be calculated (step 609 and FIG. 4, step 409). Since rigid body transformations are reversible, this point is then translated back into display-based coordinates (step 611), determining the position which the cursor should adopt within the display-based coordinate system. This will be of the form (x_(s), y_(s), 0) since the z_(s)-component of any point on the display screen plane is at z_(s)=0. After that these display-based coordinates are converted into display screen pixel positions (x_(sp), y_(sp)) using relation 1 (step 612). It may be necessary to round the display screen pixel positions to the nearest integer value (step 613). Finally the software controls the computer to position the cursor at (x_(sp), y_(sp)) (step 615 and FIG. 4, step 411).
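Once T is known, steps 601 to 615 reduce to a small linear computation: on the screen plane z_(s)=0, Relation 3 gives x_(c) and y_(c) as affine functions of x_(s) and y_(s), and the camera axis requires both to be zero. A minimal sketch (illustrative names):

```python
import numpy as np

def cursor_position(T: np.ndarray):
    """Intersect the camera axis x_c = y_c = 0 with the screen plane z_s = 0 (FIG. 6).

    On the screen plane, Relation 3 gives x_c = m00*x_s + m01*y_s + m03 and
    y_c = m10*x_s + m11*y_s + m13; setting both to zero is a 2x2 linear solve.
    """
    A = T[:2, :2]                      # [[m00, m01], [m10, m11]]
    b = -T[:2, 3]                      # [-m03, -m13]
    x_s, y_s = np.linalg.solve(A, b)   # intersection in display-based mm (steps 607-611)
    # Relation 1 inverted: mm -> display screen pixels, rounded (steps 612-613).
    return round(2 * x_s), round(2 * y_s)
```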

This completes a single iteration of the display location calculation algorithm, which is repeated to maintain fine control of the cursor by the movement of the pointer device.

Referring once again to FIG. 1, the user 101 points the pointer device 103 in such a way that its longitudinal axis intersects with the display screen 123. The user 101 can then control the cursor that appears on the display screen by moving the pointer device 103 such that it points at the part of the display screen where the user desires the cursor to be. The user 101 can use the buttons 211 a and 213 a to interact with the output of the computer 111 that is displayed on the display screen 123. In this way, an interface arrangement is provided between the user and the computer. For example, the user could point to different icons within the display. They could highlight specific text in a document. They could then even alter the appearance of the text by underlining it, making it bold or putting it in italics. They could operate pull-down menus in order to run different programs or view files. They could change slides in a digital slide presentation. They could click on hyperlinks in an internet or intranet document. It will be clear to those skilled in the art that many more functions are available to the user of such a pointer device.

It will be apparent from the foregoing description that many modifications or variations may be made to the above described embodiments without departing from the invention. Such modifications and variations include:

For a 1024×768, 24-bit true colour display screen image, the memory occupied is nearly 2.4 megabytes. Processing all this data is intensive. In order to optimise the processing of the display screen image, it may not be refreshed in every cycle of the display location calculation algorithm. The pre-processed data from the last cycle will often be sufficient and this avoids the intensive calculations required to handle the large sets of samples. Furthermore, it will be noticed that the resolution of the image captured by the camera is a great deal lower than the display screen resolution. This difference permits the pre-processing of the display screen image data to a reduced resolution in order to minimise the number of samples manipulated later in the algorithm, without a substantial loss of accuracy. This pre-processing can be done by any well known technique such as bilinear interpolation. To reduce processing time still further, the pointer device can be disabled when not pointing towards the display screen. This could be done by adding another button to the pointer device housing which must be depressed when operation of the device is required and can be released in order to disable the device.

A further way to reduce processing requirements would be to take samples from the screen instead of taking the whole screen's information. This applies both to the generation of the test image (i.e. only taking a subset of available pixels from the display and mapping them to the test image), and in terms of the tests for correlation with the real image (only trying to correlate a subset of the pixels, since the others are empty, not having been mapped in the first place).

If the transformation for a very basic position and orientation of the pointer device relative to the screen was known, then in the initial stages of operation the pointer device could be held in that position in order to provide the computation means with an accurate initial hypothesis for the transformation and hence reduce the processing requirements still further. Such a calibration position could, for example, be perpendicular to the dead centre of the display at a distance of 300 mm. In other embodiments, known artificial and distinguishable features (e.g. a known shape of a known colour) could be introduced into the display to aid calibration. This is particularly useful when the pointer device is being used to point at complex display data such as television pictures.

Although in the above described embodiments the pointer device was used to control a cursor in two dimensions, it is possible to create a 3D pointer since the 3D position of the camera relative to the display screen and the orientation of the camera axis are known. The 3D pointer could operate in such a way that every movement of the pointer device can be traced into a 3D model in order to control a 3D interface such as those found in computer aided design (CAD) packages.

Although in the above described embodiments a cursor was included at the calculated display location, it is also possible not to include a cursor. Such a circumstance may, for example, arise in a computer aided design (CAD) package where the user points the device at the display and moves the device in order to change the perspective view of the object that is being designed/drawn. For example, a user may want to look at the object from underneath and, by moving the device in a downward direction, could rotate the view to the one that is desired.

Although in the above described embodiments the position and orientation of the pointer device relative to the display was calculated, it is also possible that only the position or the orientation is calculated. In the case of position, the user of the pointer device may keep the device perpendicular to the display at all times and therefore the orientation does not change. Only the position would change. In the case of orientation, the user may keep the device in the same position but merely change the direction in which they point it and hence only change its orientation.

Although in the above described embodiments the position and orientation of the pointer device relative to the display was used to control the cursor, it is also possible that only the position or the orientation is used. For example, in the case of a CAD package, when a user wants to draw a straight line, it is common that all they have to do is click at the two end points of the line. A user of the device could click at one end point and move the device in order to move the cursor to the other end point and click again. The orientation of the device would not be relevant in this case. The situation described above in relation to changing the perspective view of an object being designed/drawn is a case when only the orientation of the device is relevant.

Although in the above described embodiments the display was generated by an LCD monitor, it is possible for the display to be generated by an LCD or digital light processing (DLP) projector and projected onto a large projection screen or even a suitable wall. This is particularly useful in the context of a user needing to point to a display that is viewed by a large audience. The camera would still capture an image of the display when the user points the device at the screen or wall and the image data acquired in this way can then be used with the display data generated by the machine (to which the projector is attached) to calculate the point of intersection.

Although in the above described embodiments the pointer device was used to control a machine, it is also possible to use it merely to point to an object in the display. In this case no activation means would be needed. This is again suitable in the context of a user giving a presentation to an audience.

Although in the above embodiments a digital video camera was used, it is possible that a camera which takes a snap shot of the display could be used instead. The snap shot of the image would provide the image data, which together with the display data can be used for image registration. The snap shot would be updated with a new snap shot as often as possible using any spare processing power.

Although in the above described embodiments the camera was a manual, fixed focus camera, it is also possible to use a camera containing an auto-focus mechanism. The distance of operation criterion could then be relaxed as the camera would constantly maintain an adequate focus. This distance criterion could also be relaxed in the case of a fixed focus lens if some means were provided for entering the magnification of the optical system into the algorithm.

Although in the above described embodiments the pointer device only used one camera, it is possible to use more than one camera. In this way the intersection point of the axis of each camera and the screen may be separately identified and the combination of all the cameras' positions and orientations taken as the control input. For example, it would be possible to create a glove with a camera in the end of each finger. Thus a natural form of physical manipulation can be used, e.g. gripping and dropping, using the five finger points on the screen.

In alternative embodiments, the pointer device could contain some computational means (in the form of a CPU, memory and storage devices) so that some or all of the processing requirements can be carried out by the computational means within the pointer device. For example, the pointer device can be combined with a handheld computing device such as a personal digital assistant (PDA).

CLAIMS

1. Display location calculation means comprising: a display generator arranged in operation to generate a display in response to display data generated by a machine; a pointer device carrying a camera operable to generate image data representing at least part of the scene within the field of view of the camera, which part includes an image of at least a portion of the display; computation means arranged in operation to: receive said image data; receive said display data; calculate, from said image data and said display data, the position and/or orientation of said pointer device relative to said display; calculate a display location from said calculated position and/or orientation.
2. Display location calculation means according to claim 1 wherein said pointer device is elongate in shape and wherein said display location is the point where the longitudinal axis of said pointer device intersects with said display.
3. An interface arrangement for providing an interface between a user and a machine comprising: display location calculation means according to claim 2; wherein said computation means is further arranged in operation to control said machine in accordance with the position of the cursor.
4. An interface arrangement according to claim 3 wherein said machine comprises a computer having a processor and wherein said computation means comprises said processor.
5. An interface arrangement according to claim 3 wherein said pointer device further carries a computer having a processor and wherein said computation means comprises said processor.
6. An interface arrangement according to claim 3 wherein said machine comprises a first computer having a first processor and wherein said pointer device further carries a second computer having a second processor and wherein said computation means comprises said first processor and said second processor.
7. An interface arrangement according to claim 3 wherein said pointer device further comprises at least one activation means for use in controlling said machine.
8. An interface arrangement according to claim 3 wherein said display generator comprises a projector.
9. An interface arrangement according to claim 3 wherein said camera is a digital video camera.
10. An interface arrangement according to claim 3 wherein said camera is a fixed focus camera.
11. An interface arrangement according to claim 10 further comprising an indicator arranged in operation to indicate that the image captured by said camera is focussed.
 12. Aninterface arrangement according to claim 3 wherein said camera has afixed spatial relationship with said pointer device.
13. Display location calculation means according to claim 1 wherein a cursor is included in said display at said display location.
14. Display location calculation means according to claim 13 wherein the position of said cursor varies in accordance with said calculated position and/or orientation.
15. A method of calculating a display location, said method comprising the steps of: i. generating a display in accordance with display data generated by said machine; ii. capturing image data representing at least part of the scene within the field of view of a camera carried by a pointer device wherein at least a portion of said display is included in said field of view; iii. calculating from said image data and said display data the position and/or orientation of said pointer device relative to said display; iv. calculating a display location from said calculated position and/or orientation.
16. A method according to claim 15 comprising the additional step of generating a cursor in said display at said display location.
17. A method according to claim 16 wherein the position of said cursor varies in accordance with said calculated position and/or orientation.
18. A method according to claim 17 wherein said pointer device is elongate in shape and wherein said display location is the point where the longitudinal axis of said pointer device intersects with said display.
19. A method according to claim 15 comprising the additional step of storing data representing the history of the position and/or orientation of said pointer device relative to said display and wherein said calculation step (iii) further takes into account said data.
20. Display location calculation means including: a storage medium having recorded therein a computer readable program processable to provide an interface between a user and a machine, said program comprising: display data acquisition code processable to obtain display data representing a display; image data acquisition code processable to obtain, from a pointer device carrying a camera, image data representing at least part of the scene within the field of view of said camera; position/orientation calculation code processable to calculate from said display data and said image data the position and/or the orientation of said pointer device relative to said display; display location calculation code processable to calculate from said calculated position and/or orientation a display location.
21. A digital data carrier carrying a computer program of instructions executable by a computer apparatus to perform the method steps as set out in claim 15.